
Add repair_run action to manage_job_runs MCP tool #444

Merged
calreynolds merged 2 commits into main from fix/repair-job-run on Apr 13, 2026



@jacksandom
Collaborator

Summary

  • Adds repair action to manage_job_runs MCP tool, enabling retry of only failed tasks instead of re-running entire jobs
  • Implements repair_run() core function following the existing run_job_now pattern
  • Supports rerun_all_failed_tasks, rerun_dependent_tasks, rerun_tasks, and latest_repair_id
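
As a rough illustration of how the four options above could combine into one repair request, here is a minimal sketch. `build_repair_payload` is a hypothetical helper, not code from this PR, and the rule that `rerun_all_failed_tasks` and `rerun_tasks` are mutually exclusive is an assumption based on the Databricks Jobs API documentation rather than something this PR states.

```python
from typing import Any, Dict, List, Optional


def build_repair_payload(
    run_id: int,
    rerun_all_failed_tasks: bool = False,
    rerun_dependent_tasks: bool = False,
    rerun_tasks: Optional[List[str]] = None,
    latest_repair_id: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the body of a repair request."""
    # Assumption: only one way of selecting tasks may be used per request.
    if rerun_all_failed_tasks and rerun_tasks:
        raise ValueError("Use either rerun_all_failed_tasks or rerun_tasks, not both")

    payload: Dict[str, Any] = {"run_id": run_id}
    if rerun_all_failed_tasks:
        payload["rerun_all_failed_tasks"] = True
    if rerun_tasks:
        payload["rerun_tasks"] = list(rerun_tasks)
    if rerun_dependent_tasks:
        payload["rerun_dependent_tasks"] = True
    # Only sent when chaining onto an earlier repair of the same run.
    if latest_repair_id is not None:
        payload["latest_repair_id"] = latest_repair_id
    return payload
```

Unset options are omitted from the payload, so a first repair carries only `run_id` and the task selection.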

Closes #392

Problem

When a job run had failed tasks, the LLM had no way to repair the run via MCP. The repair action was missing from manage_job_runs, so attempts fell through to a ValueError. The LLM then fell back to run_now, re-running all tasks (including successful ones), wasting compute and time.
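
The failure mode described above can be sketched with a small action dispatcher. The PR does not show the real dispatch code in `manage_job_runs`, so the handler table, the stub functions, and their return values here are all hypothetical.

```python
from typing import Any, Callable, Dict


def run_job_now(**kwargs: Any) -> Dict[str, Any]:
    """Stub: stands in for a full re-run of every task."""
    return {"mode": "full_run", **kwargs}


def repair_run(**kwargs: Any) -> Dict[str, Any]:
    """Stub: stands in for a repair that reruns only selected tasks."""
    return {"mode": "repair", **kwargs}


def dispatch(action: str, handlers: Dict[str, Callable[..., Dict[str, Any]]], **kwargs: Any) -> Dict[str, Any]:
    """Route an action name to its handler; unknown actions raise ValueError."""
    handler = handlers.get(action)
    if handler is None:
        raise ValueError(f"Unknown action {action!r}; expected one of {sorted(handlers)}")
    return handler(**kwargs)


# Before the change: "repair" is not registered, so it falls through to ValueError.
HANDLERS_BEFORE = {"run_now": run_job_now}
# After the change: "repair" has its own handler.
HANDLERS_AFTER = {**HANDLERS_BEFORE, "repair": repair_run}
```

With `HANDLERS_BEFORE`, `dispatch("repair", ...)` raises, which is the point where a caller might retry with `run_now`. With `HANDLERS_AFTER`, the same call reaches the repair handler.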

Changes

| File | Change |
| --- | --- |
| databricks-tools-core/.../jobs/runs.py | New repair_run() function |
| databricks-tools-core/.../jobs/__init__.py | Export repair_run |
| databricks-mcp-server/.../tools/jobs.py | Import, 4 new params, "repair" action dispatch with error handling |
| databricks-tools-core/tests/.../conftest.py | failing_notebook_path fixture |
| databricks-tools-core/tests/.../test_runs.py | TestRepairRun class |

Test plan

  • All 11 integration tests pass (including new TestRepairRun)
  • MCP smoke test: repair with rerun_all_failed_tasks=True returns repair_id
  • MCP smoke test: chained repair with latest_repair_id
  • Existing tests unaffected
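
The chained-repair smoke test can be pictured with an in-memory stand-in. `FakeRunsBackend` is hypothetical and is not the fixture used in `TestRepairRun`. It assumes, following the Jobs API documentation, that a second repair of the same run must pass the `repair_id` returned by the previous one as `latest_repair_id`.

```python
from typing import Any, Dict, Optional


class FakeRunsBackend:
    """In-memory stand-in for the Jobs service; not the real API client."""

    def __init__(self) -> None:
        self._latest: Dict[int, int] = {}  # run_id -> most recent repair_id
        self._next_repair_id = 1

    def repair(
        self,
        run_id: int,
        rerun_all_failed_tasks: bool = False,
        latest_repair_id: Optional[int] = None,
    ) -> Dict[str, Any]:
        previous = self._latest.get(run_id)
        # Assumption: later repairs must reference the latest earlier repair.
        if previous is not None and latest_repair_id != previous:
            raise ValueError("latest_repair_id must match the previous repair of this run")
        repair_id = self._next_repair_id
        self._next_repair_id += 1
        self._latest[run_id] = repair_id
        return {"run_id": run_id, "repair_id": repair_id}


backend = FakeRunsBackend()
first = backend.repair(42, rerun_all_failed_tasks=True)
second = backend.repair(
    42, rerun_all_failed_tasks=True, latest_repair_id=first["repair_id"]
)
```

Each repair returns a new `repair_id`, and the second call succeeds only because it carries the first call's ID.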

@jacksandom jacksandom requested a review from calreynolds April 10, 2026 08:52
Collaborator

@calreynolds calreynolds left a comment

Perfect!

@calreynolds calreynolds merged commit ee45a01 into main Apr 13, 2026
@jacksandom jacksandom deleted the fix/repair-job-run branch April 13, 2026 13:58

Development

Successfully merging this pull request may close these issues.

bug: repair_run triggers full job run instead of repairing failed tasks

2 participants